Annotators' Agreement: The Case of Topic-Focus Articulation

Authors

  • Kateřina Veselá
  • Jiří Havelka
  • Eva Hajičová
Abstract

The annotation of the Prague Dependency Treebank (PDT) is conceived as a multilayered scenario that also includes dependency representations (tectogrammatical tree structures, TGTSs) of the underlying structure of the sentences. TGTSs capture three basic aspects of this underlying structure: (a) the dependency tree structure, (b) the kinds of dependency syntactic relations, and (c) the basic characteristics of the topic-focus articulation (TFA). Since the PDT is a large collection and the annotations on the deepest layer are to a large extent performed by several human annotators (working from the output of an automatic preprocessing module), it is essential to monitor the consistency of the individual annotators and the agreement among them. In the present paper, we summarize the results of the evaluation of parallel annotations of several samples taken from the PDT and the measures adopted to improve the consistency of the annotations.
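
The abstract does not say which agreement measure was used. As a minimal sketch only, assuming two annotators each assign one TFA value per node and using the PDT value names t, c, and f, raw agreement and Cohen's kappa (a standard chance-corrected measure, not necessarily the one the authors applied) could be computed as follows. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Share of nodes on which the two annotators chose the same value."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotations must cover the same non-empty node list")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each drew independently from their own label distribution.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical parallel annotation of six nodes (illustrative data only).
annotator_1 = ["t", "f", "f", "c", "t", "f"]
annotator_2 = ["t", "f", "t", "c", "t", "f"]

print(f"observed agreement: {observed_agreement(annotator_1, annotator_2):.3f}")
print(f"Cohen's kappa:      {cohen_kappa(annotator_1, annotator_2):.3f}")
```

Raw agreement is easy to read but ignores how often two annotators would coincide by chance, which is why a chance-corrected figure is usually reported next to it.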


Similar resources

Corpus Annotation on the Tectogrammatical Layer: Summarizing of the First Stages of Evaluations

We summarize here the results of a series of evaluations of the annotators’ assignments of tectogrammatical (i.e. underlying syntactic) tree structures and of the values of the edges as well as the values of the attribute representing the topic-focus articulation of the sentences, within the large-scale project of the Prague Dependency Treebank.


What can linguists learn from some simple statistics on annotated treebanks

The goal of the present contribution is rather modest: to collect simple statistics carried out on different layers of the annotation scenario of the Prague Dependency Treebank (PDT; [1]) in order to illustrate their usefulness for linguistic research, either by supporting existing hypotheses or suggesting new research questions or new explanations of the existing ones. For this purpose, we hav...


The Prague Dependency Treebank: Crossing the Sentence Boundary

The units processed by tagging procedures, both automatic and manual, are sentences (as occurring in the texts in the corpus), but the human annotators are instructed to assign (disambiguated) structures according to the meaning of the sentence in its environment, taking contextual (and factual) information into account. We focus in the paper on two issues: how to capture (i) the topic-focus arti...


Let's Agree to Disagree: Measuring Agreement between Annotators for Opinion Mining Task

There is a need to know to what degree humans can agree when classifying a sentence as carrying some sentiment orientation. However, little research has been done on assessing the agreement between annotators for the different opinion mining tasks. In this work we present an assessment of agreement between two human annotators. The task was to manually classify newspaper sentences into one...


Comparing transcription agreement on non-native English speech corpus between native and non-native annotators

This paper aims to compare transcription agreement between native and non-native annotators on a non-native English speech corpus spoken by Korean learners. Ten non-native annotators and three native annotators participate in the transcription of 608 sentences. All annotators are provided with forced-aligned phone sequences, which are to be corrected where they are realized differently. The...



Publication date: 2004